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Abstract. Many natural simulation tasks, such as generating a 
partition of a large integer n, choosing uniformly over the p n con- 
ceivable choices, might be done by proposing a random object from 
a larger set, and then checking for membership in the desired target 
set. The success probability, i.e., the probability that a proposed 
random object hits the target, may be very small. 

The usual method of rare event simulation, the exponential shift 
or Cramer twist, is analogous to the saddle point method, and can 
remove the exponentially decaying part of the success probability, 
but even after this, the success probability may still be so small as 
to be an obstacle to simulation. 

To simulate random integer partitions of n, using Fristedt's 
method, the initial proposal is a partition of a random integer 
of size around n, so that the counts of parts of each size are mu- 
tually independent. The usual method corresponds to evaluating 
the generating function at exp(— w/y/fm), and the remaining small 
probability is asymptotic to (96n 3 )" 1//4 . 

Here, we propose a new method, probabilistic divide-and-conquer, 
for dealing with the small probability, e.g., the order n -3 / 4 prob- 
ability in the example of integer partitions. This method is anal- 
ogous to changing a very difficult game, in which "hole in one" is 
the only way to score, to the usual game of golf. 

There are many variations on the basic idea, including a simula- 
tion technique we call mix-and-match, with features of the coupon 
collector's problem. 

For the case of integer partitions, we have a close to ideal recur- 
sive scheme, not involving mix-and-match. The asymptotic cost 
is V2 times the cost to propose a random partition of a random 
integer of size around n, so that the algorithm is within 0(1) of 
the entropy lower bound. 
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1. Introduction. 

1.1. Exact simulation. Exact simulation methods provide samples 
from a set of objects according to some given probability distribution. 
For many combinatorial problems, the given distribution is the uniform 
choice over all possibilities of a given size. 

An important technique from von Neumann 1951, |32j . which we re- 
view in Section [2T2l is acceptance/rejection sampling, where one sam- 
ples from an easy-to-simulate distribution related to the desired distri- 
bution, and then rejects some samples. The precise recipe is to accept 
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samples with probability proportional to the ratio of the desired prob- 
ability to the probability under the easy-to-simulate distribution. The 
result is that upon acceptance, the sample is an exact sample from the 
desired distribution. 

Another important technique is Markov chain coupling from the past 
(CFTP), where one keeps track of coalescing Markov chains starting 
from some time in the past, and constructs a long enough past that 
all chains have coalesced by time zero [SO]. This is now a very lively 
subject, with over 160 papers described at 
http : / /dimacs . rutgers . edu/ ~dbwilson/ exact . html/ 

Divide-and-conquer is a basic strategy for algorithms. Notable ex- 
amples include the Cooley-Tukey fast Fourier transform, attributable 
to Gauss [15j, and Karatsuba's fast multiplication algorithm [16], which 
surprised Kolmogorov, [T7] . We note that these and other cases treated 
in textbooks on algorithms are deterministic. Randomized quicksort, 
see for example [8] Section 7.3, is the prototype of divide-and-conquer 
using randomness, but such algorithms can be thought of as a variation 
on the deterministic algorithm, applied to a permutation of the input 
data. 

In this paper, we propose a new method for exact sampling: proba- 
bilistic divide-and-conquer; here, the subdivision of the original prob- 
lem is inherently random. We prove, in a couple of general settings, 
that this method does achieve exact samples. 

Then we illustrate the use of this general method, with the target 
being to sample, for a given n, uniformly from the p n partitions A of 
the integer n. The starting point is Fristedt's construction, [T2], with 
a random integer T of size around n, such that, for a random partition 
of T, the counts of parts of various sizes are mutually independent; 
we review this in Section 14.21 The problem then is how to simulate 
efficiently, since the event T = n is a rare event, as in [TJ. 

In order of ontogeny these methods express a partition as A = (A, B), 
where 

(1) A is the (list of) large parts, say \/n up to n, B is the small 
parts, size 1 up to \/n. Mix-and-match may offer an additional 
speedup. 

(2) B is the number of parts of size 1, A is everything else. Hence 
the B side of the simulation is trivial, with no calls to a random 
number generator. Nevertheless, there is a large speedup, by a 
factor asymptotic to \/n/c, where c = tt/\/6. 





(3) 



In 



p(z) = d(z)p(z 2 ), 
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with the classical p(z) = ^2 n>0 p n z n = IL>i(l — z l )~ l , and 
d(z) = rii>i(l + 2;l ) ^° enumerate partitions with distinct parts, 
A corresponds to d(z). This method iterates beautifully, re- 
ducing the target, n, by a factor of approximately 4 per iter- 
ation, with an acceptance /reject ion cost of only roug hly 2y/2, 
improved to Vz in Section 14.5.11 We have run this on a per- 
sonal computer, with n as large as 2 49 , and relative to the ba- 
sic algorithm "waiting-to-get-lucky" , analyzed in Section |4.2[ 
this version of divide-and-conquer achieves roughly a billion- 
fold speedup 

1.2. Probabilistic divide-and-conquer, motivated by mix-and- 
match. For us, the random objects S whose simulation might benefit 
from a divide-and-conquer approach are those that can be described as 
S = (A, B), where there is some ability to simulate A and B separately. 
Specifically, we require that A G A, B G B, and that there is a function 
h : A x B — > {0, 1}, so that for a G A, b G B, h(a, b) is the indicator 
that "a and b fit together to form an acceptable s." Furthermore, we 
require that A and B be independent, and that the desired S be equal 
in distribution to ( (A, B)\h(A, B) = 1). This description, independent 
objects conditional on some event, may seem restrictive, but [51 [TT] 
shows that very broad classes — combinatorial assemblies, multisets, 
and selections — fit into this setup. 

Now imagine that one wants an honest sample of size m, that is, 
Si, S2, ■ ■ ■ , S m , to be independent, with the original S along with Si, S2, ■ 
to be identically distributed. The pedestrian approach is to propose 
sample values Ai,A 2 ,..., and Bi,B 2 , . . ., and to consider the indicators 

of aligned matching, that is, h(Ai, Bi), h{A 2 , B 2 ), One naturally 

has waiting times Ti,T 2 , . . ., with 7\ := min{i > 1 : h(Ai,Bi) = 1}, 
and for for k > 1, := min{i > T k _i : h(Ai, B^) = 1}. And of course, 
the ith. sample found is Si :— (A Te , B Tl ). 

In the case where k = T m is large, the scheme just described seems 
wasteful — we proposed k values of A G A, and k values of B G B, and 
hence might have k 2 opportunities for a match. That is, rather than 
just look for the aligned potential matches, scored by the h(Ai, Bi) for 
1 < i < k, we might have considered the h(Ai,Bj) for 1 < i,j ' < k, 
with k 2 index pairs Indeed, this situation arises naturally in bio- 

logical sequence matching, [SIS], where for two independent sequences 



1 The RandomPartition function in Mathematica® [33] appears to hit the wall 
at around n — 2 20 . 
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of iid letters, in many but not all cases for the two marginal distribu- 
tions, unrestricted rather than aligned matching effectively squares the 
number of ways to look for a match, and hence approximately doubles 
the length of the longest match found. 

Of course, the difficulty with the general program "search all k 2 
index pairs if is that conflicting matches might be found. Suppose, 
for example, that exactly two matches are found, with Ai matching 
both Bj and By , with j ^ f . It is easy to see that taking both AB 
pairs ruins the iid nature of a sample. Also, though perhaps not as 
obvious, other strategies, such as suppressing both AB pairs, or taking 
only the pair indexed by the lexicographically first of ij,ij', or taking 
only one pair, based on an additional coin toss, introduce bias relative 
to the desired distribution for S. 

There is a way to allow mix-and-match, and still get an honest sam- 
ple^; details will be given in Section [3j The first step is to use rejection 
sampling to produce a list of As, distributed as a sample from the 
distribution of As biased by the chance that they would match, if a sin- 
gle B were proposed. For us, it only became apparent later, that this 
use of rejection, even in the absence of mix-and-match, can be useful; 
Theorem [2] describes a surprising example. 

2. The basic lemma for exact sampling with 
divide- and-conquer 

We assume throughout that 

(1) A G A, B G B have given distributions, 

(2) A, B are independent, 

(3) h: Ax B ^ {0,1} 

satisfies p:=E h(A, B) G (0, 1] ,| 
where, of course, we also assume that h is measurable, and 

(4) S eAxB has distribution C(S) = £( (A, B) \ h(A, B) = 1), 

i.e., the law of S is the law of the independent pair (A, B) conditional 
on having h(A, B) = 1. 

2 under a condition on the structure of h, described by Lemma [21 
3 The requirement that p > is not needed for divide-and-conquer to be useful; 
but rather, a choice we make for the sake of simpler exposition. In cases where p = 0, 
the conditional distribution, apparently specified by (Tj| , needs further specification 
— this is known as Borel's paradox. 
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Note that a restatement of (BJ, exploiting the hypothesis (jHD, is that 
for measurable sets R C A x B, 

= F((A,B)ER a nd h(A,B) = l) ^ 

V 

or equivalently, for bounded measurable functions g from Ax B to the 
real numbers, 

Eg(S) = E(g(A,B)h(A,B))/p. 

Since we have assumed p > 0, this is elementary conditioning. This 
allows the distributions of A and B to be arbitrary: discrete, absolutely 
continuous, or otherwise. 

The following lemma is a straightforward application of Bayes' for- 
mula. 

Lemma 1. Suppose X is the random element of A with distribution 

(5) £{X) = £{A\h{A,B) = 1), 

and Y is the random element of B with conditional distribution 

(6) C(Y\X = a) = C(B\h(a,B) = 1). 

Then (X, Y) = d S , i.e. the pair (X, Y) has the same distribution as S, 
given by fll]). 

Proof. A restatement of (jSJ) is 

(7) F(X e da) = WW*a,B) 

P 

and relation is equivalent to 

(8) £((X, Y) \X = a)= £( (a, B) \ h(a, B) = l). 
Hence, for any bounded measurable g : A x B — > R, 

Eg(X,Y) = E(E(g(X,Y)\X)) 

F(Xeda) E(g(X,Y)\X = a) 

A 

F(A E da) Eh(a, B) E (g(a, B)h(a, B)) 



A p Eh(a,B) 

= - I F(A G da) E(g(a,B)h(a,B)) 

pJa 

= E(g(A,B)\h(A,B) = l)=Eg(S). 

We used jSJ) for the middle line in the display above; on the set Aq := 
{a G A : Eh(a, B) = 0}, which contributes to the integral, we took 



PROBABILISTIC DIVIDE- AND-CONQUER 



7 



the usual liberty of dividing by 0, rather than writing out separate 
expressions for the integrals over Aq and A \ Aq. 

□ 

2.1. Algorithmic implications of the basic lemma. Assume that 
one wants a sample of fixed size m from the distribution of S. That 
is, one wants to carry out a simulation that provides Si, 62, . . ., with 
Si, S%, . . . , S m being mutually independent, with each equal in distri- 
bution to S. According to Lemma dj this can be done by providing m 
independent pairs (A i; Yi), i — 1 to m, each equal in distribution to S. 

A reasonable choice of how to carry this out, not yet using mix-and- 
match, involves the following: 

Outline of an algorithm to gather sample of size m. 

Stage 1. Sample Xi, X2, . . . , X m from the distribution of X, i.e., from 

C(X) = C{A\h(A,B) = 1). 

Stage 2. Conditional, on the result of stage 1 producing (Xi, . . . , X m ) = 
), find Yi, Y 2 , . . . , Y m , mutually independent, with distribu- 
tions C(Yi) = £( B I h{a u B) = 1 ). 

Note that in general, conditional on the result of stage 1, the Yj in 
stage 2 are not identically distributed. Furthermore, the trials under- 
taken to find these Yj need not be independent of each other. 

2.2. Use of acceptance/rejection sampling. Assume that we know 
how to simulate A — this is under the distribution in where A, B 
are independent. But, we need instead to sample from an alternate 
distribution, denoted above as that of X e A. The rejection method 
recipe, for using (0), may be viewed as having 4 steps, as follows. 

(1) Find a threshold function t : A — > [0, 1], with t(a) proportional 
to Eh(a,B)/p, i.e., t of the form t(a) = C x Eh(a,B)/p for 
some positive constant C. 

(2) Propose iid samples Ai, A2, ■ ■ ., 

(3) Independently generate uniform (0,1) random variables Ui, U 2 , ■ ■ ., 

(4) If Ui < t(Ai), then accept Ai as as X value; otherwise reject it. 

The cost of using acceptance/rejection. 

Naturally, one wants to take the constant C for the threshold function 
t in step (1) to be as large as possible. This is subject to the constraint 
t(a) < 1 for all a, i.e., C x Eh(a, B)/p < 1. The expected fraction of 
proposed samples A4 to be accepted will be the average of t(a) with 
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respect to the distribution of A, i.e., 

Pace := P(U < t(A)) = Et(A) =e(cx = C, 

and the expected number of proposals needed to get each acceptance 
is the reciprocal of this, so we define 

1 1 

(9) Acceptance cost := = — . 

Pace ^ 

Assuming that we can find an a* where E h(a, B) achieves its maximum 
value, this simplifies to 

(10) Acceptance cost = I = ^-^y, 

For comparison, if we were not using divide-and-conquer, but instead 
proposing pairs (A^Bj) and hoping to get lucky, i.e., hoping that 
h(Ai, Bi) = 1, with success probability p and expected number of pro- 
posals to get one success equal to 1/p, then, ignoring the cost of propos- 
ing the Bi, the speedup involved in ( TlOl) is a factor of 1/Kh(a* , B). 

Is the threshold function t computable? In step (4), for each 
proposed value a = A iy we need to be able to compute t(a); this can 
be either a minor cost, a major cost, or an absolute impediment, mak- 
ing probabilistic divide-and-conquer infeasible. All of these variations 
occur, in the context of integer partitions, and will be discussed in 
Sections 14.31 - 14.5[ and again in Section 15.21 

2.3. Caution: mix-and-match not yet enabled. In Stage 2 of the 
2-stage algorithm described at the beginning of Section 12. 1[ there is a 
subtle issue involved in the phrase "independently find Y%, Y 2 , . . . , Y m 
• • • " . Suppose, for example, that one plans to propose iid copies of 
B, say Bi,B 2 ,..., waiting for samples that will match one of the 
ai, a 2 , • • • , a m obtained in the first stage. 

A correct way to carry out stage 2 is to consider stopping times 
1 < ai < a 2 < ■ ■ ■ < a m , with 

(11) «i := min{j > 1: h(ai,Bj) = 1}, 

and then recursively, for I = 2 to m, ae := min{j > ai-\. h(ae, Bj) = 1}. 
Here, stage 2 is completed after proposing a m copies of B, and the m 
samples of S are (ag, B at ) for £ = 1 to m. 

In general, it would be incorrect to try to speed up stage 2 by taking 
stopping times 1 < Pi < (3 2 < ■ ■ ■ < (3 m , with 

(12) f3\ := min{j > 1: h(ai, Bj) = 1 for some i G {1, 2, ... , m}}, 
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and then 02 '■= min{j > (3±: h(at, Bj) = 1 for one of the m — 1 values i 
not already matched}, and so on. [In case the B at time Pi matches 
for more than one index i, we might choose, for example, to declare the 
smallest available i to serve as the index matched, for the sake of spec- 
ifying which m — 1 indices are available in the definition of /?2- Other 
choices as to which i, even those involving auxiliary randomization, 
will have the same effect.] Say that I(£) is the index matched at time 
(3g. With the random permutation 7r defined to be the inverse of the 
permutation /(•), the sample consists of (a^, Bp.) for £ = 1 to m. Not 
only there may be dependence in the sequence Bp. x) , Bp, 2 . , . . . , Bp,. , 
it may also be the casa^ that Bp, t . fails to have the desired marginal 
distribution, i.e. that of B conditional on h(ci£,B) = 1. (Perhaps this 
B was matched to when a higher priority index i was available; the 
fact of not satisfying h(cii,B) = 1 biases the distribution.) 

3. Simple matching enables mix-and-match 

Lemma [T] was basic, and so is the following lemma, but the pair 
serves nicely to clarify the logical structure of what is needed to en- 
able probabilistic divide-and-conquer, versus what is needed to further 
enable mix-and-match. 

Lemma 2. Given h : A x B — > {0, 1}, the following two conditions are 
equivalent: 

Condition 1: Va, a' G A, b, b' G B, 

(13) 1 = h(a,b) = h(a',b) = h(a,b') implies h(a',b') = 1, 

Condition 2: 3C, and functions C4 : A — )■ C, eg : B — )■ C, so that 
V(a, b) G A x B, 

(14) h(a,b) = l(c A (a)=c B (b)). 

We think of C as a set of colors, so that condition (|14p says that a and 
b match if and only if they have the same color. 

Proof. That ([Tip implies f|T3|) is trivial. In the other direction, it is easy 
to check that f|T3|) implies that the relation ~_4 on A given by a a' 
iff 36 G B, 1 = h(a, b) = h(a', b) is an equivalence relation. Likewise for 
the relation ~ B on B given by b ~ B b' iff 3a G A, 1 = fo(a, 6) = ft(a, 6'). 
For the set of colors, C, we might take either the set of equivalence 
classes of A modulo or the set of equivalence classes of B modulo 
~s, and then f[T3"j) also provides a bijection between these two sets of 
equivalence classes, to induce CHI) . □ 



4 We thank Sheldon Ross for first observing this. 
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Remark 1. The proof of Lemmal^ with equivalence classes of A/~a 
and B/ shows that the pair of coloring functions satisfying (JHj) 
is essentially unique. Specifically, unique apart from relabeling and 
padding, i.e., an arbitrary permutation on the names of the colors used, 
and enlarging the range, C, to an arbitrary superset of the image. 



Remark 2. The statement of Lemma^ shows that coloring is not es- 
sentially an issue of sufficient statistics. After all, hypothesis (TT31) only 
concerns the logical structure of the matching function h appearing in 
(j3J), and doesn't involve the distributions on A and E appearing in ([1]). 



Remark 3. When (|14p holds, we can write the event that A matches 
B as a union indexed by the color involved: 

{h(A, B) = 1} = U keC {c A (A) = k,c B (B) = k}, 

so that p = J2k&c ^( c a(A) = k,cs(B) = k), and we see that at most a 
countable set of colors k contribute a strictly positive amount to p. As 
a notational convenience, we take N C C, and use positive integers k 
for the names of colors that have 

(15) P(cu(A) = k, c B (B) = k) > 0. 

[The fact that A, B are independent, hence F(c A (A) = k, cb{B) = k) = 
F(c A (A) = k)¥(cB(B) = k), is irrelevant to main idea behind f[T5"j) . 
However, a technique we refer to as 'roaming x ' in Section \4-^.^\ is an 



example of how to take advantage of the independence of A and B.J 



The intent of the following lemma is to show that if h satisfies (JT] 
then mix-and-match strategies can can be used in stage 2 of the broad 
outline of Section 12. 1[ 

Lemma 3. Assume that h satisfies (j!4p . Consider a procedure which 
proposes a sequence Di, D2, . . . of elements ofB with the following prop- 
erties: 

There is a sequence of a— algebras C T\ C Ti C ■ ■ • . [We think 
of J-q as carrying the information from stage 1 of an algorithm along 
the lines described in Section IKT\ carrying information such as "which 
demands ai, a 2 , . . . must be met — or reduced information, such as the 
colors c A (ai),. .. ,c A (a m ).J 

For every n > 1 and k satisying (|15p . conditional on Fn-i together 
with ce{D n ) = k, the distribution of D n is equal to C{B\c^{B) = k). 

For every k satisying (fl5l) . 

(16) with probability 1, infinitely many n have ce(D n ) = k. 
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Define stopping times t- , the "time n of the i-th instance of cs(D n ) = 



k, by r = and for i > l,7y = inf{n > t\_\ : c^(D n ) = k}. 



We write D{n) = D n , to avoid multi-level subscripting, and define 



Then, for each k, the B\ ,B\ , . . . are independent, with the distri- 
bution C{B\cq{B) = k), and as k varies, these sequences are mutually 
independent. 

Proof. The proof is a routine exercise; it suffices to check the indepen- 
dence claim for an arbitrary finite number of choices of k, restricting to 

(k) 

the B\ for 1 < i < io with an arbitrary finite %q, and this can be done, 
along with checking for the specified marginal distributions, by sum- 
ming over all possible values for the random times . Writing out the 
full argument would be notationally messy, and not interesting. □ 

4. Algorithms for simulating random partitions of n 

In this section we focus on the generation of partitions under the 
uniform distribution. Other distributions on partitions such as the 
Plancherel measure, while important in many applications, for example 
PIHC32], are not addressed in the current paper, but it is highly plausible 
that these techniques can be adapted to those measures. We note that 
the phrase "generating partitions" is usually taken, as in [19] , to mean 
the systematic listing of all partitions, perhaps subject to additional 
constraints; this is very different from simulating a random instance. 

The computational analysis that follows uses an informal adaptation 
of uniform costing; see Section 15.21 Some elements of the analysis, 
specifically asymptotics for the acceptance rate, will be given rigorously, 
for example in Theorems [T], and [31 

4.1. For baseline comparison: table methods. A natural simula- 
tion method is to find the largest part first, then the second largest 
part, and so on according to a distribution of largest partH The main 
cost associated with this method is the storage of all the distributions 
needed. Some details follow. 

Let p(< k, n) denote the number of partitions of n with each part of 
size < k, so that p(< n, n) = p n . These can be quickly calculated from 
the recurrence 



For the largest part, we are are dealing with unrestricted partitions of n, but for 
subsequent parts, the problem involves the largest part of a partition of an integer 
m into parts of size at most k. 




D(t?>) fori = 1,2,.... 



(17) 



piS fcj n ) = p(— k — l,n) + p(< k,n — k), 
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where the right hand side counts the number of partitions without any 
fc's plus the number of partitions with at least one k. 

Let Xi denote the z-th largest part of a randomly generated partition, 
so A = (Xi,X 2 , . . .)• We have 

nX 1 <j)=p(<j,n)/p n , 

F{X 2 < = j) = p{< s,n- j)/p{< j, n-j), 
and in general for i > 2 

W(Xi <ji\X! =ji,...,Xi-i =ji-i) = p(< ji,n-^2jt)/p(< ji-!,n-^2 

Rather than computing each quantity on the right hand side as it 
appears, an n by n table, where the j-th column consists of the numbers 
£>(— 1) j)>P(— 2, j), • • • ,£>(< ti, j), can be computed and stored, hence 
generating partitions is extremely fast once this table has been created. 
For one uniformly distributed random number, we can look up the value 
of the largest part of a random partition; this implies order of y/n log n 
lookups. An easy variation also finds the multiplicity of the largest 
part, implying order of \fn lookups. 

The memory conditions are on the order of n 2 for the entire table, 
a severe constraint, but once it has been constructed the generation 
of random partitions is rapid@ Another variation on this method, 
based on Euler's identity np n = J2dj>idPn-dj, is given in [25j EE], 
and cited as "the recursive method." We haven't found a clearcut 
complexity analysis in the literature, although [H] comes close. But 
we believe that this algorithm is less useful than the p(< k, n) ta- 
ble method - - sampling from the distribution on (d,j) implicit in 
n Pn = J2dj>idPn-dj requires computation of partial sums; if all the 
partial sums for J2dj>i dj<m^Pm~dj with m < n are stored, the to- 
tal storage requirement is of order n 2 log n, and if they aren't stored, 
computing the values needed, on the fly, becomes a bottleneck. 



Since the largest part of a random partition is extremely unlikely to exceed 
0(\fn\ogn), one can get away using a table of size 0(n 3 / 2 log n), which will only 
rarely need to be augmented. Specifically, writing for the largest part of a 
partition, with c = 7r/v6, for fixed A > 0, with zo = io(n, A) := y/n\og(A^/n/c)/c, 
Pn.(Ai > ?'o) — > 1 — exp(— 1/A). For example, take A = 1000, so that the io by 
n table will need to be augmented with probability approximately .001. With M 
words of memory available for the table, we solve io(n, A) x n = M; for example, 
with M = 2 28 and A = 1000 we have n = 15000 = 2 17 and i = 1700, and 
increasing M to 2 37 gets us up to n — 9 x 10 6 = 2 23 , i — 15000. Instead of 
actually augmenting the table, one could treat the roughly one out of every A 
computationally difficult cases as deliberately missing data, as in Section [4.3. 31 
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4.2. Waiting to get lucky. Hardy and Ramanujan [H] proved the 
asymptotic formula, as n — > oo, 

(18) p n ~ exp(2c v / n)/(4v / 3n), where c = vr/v 7 ^ = 1.282550. 

Fristedt [12] observed that, for any choice x 6 (0, 1), if ^ = 
has the geometric distribution given by 

(19) P(Zi = k) = ¥ x (Zi = k) = (1 - x l ) (x 4 ) fc , fc = 0, 1, 2, ... , 
with Zi, Z 2 , . . . mutually independent, and T is defined by 

(20) T = Z t + 2Z 2 + ■ ■ ■ , 

then, conditional on the event (T — n), the partition A having Z{ parts 
of size i, for i = 1,2,..., is uniformly distributed over the p n possible 
partitions of 

This extremely useful observation is easily seen to be true, since for 
any nonnegative integers (c(l), c(2), . . .) with c(l) + 2c(2) + • ■ • — n, 
specifying a partition A of the integer n, 

F(Zi = a,i = l,2,...) = Y[ V(Zi = c(i)) = J] ((1 - x^x')"®) 

(21) = x c{1)+2c{2)+ - Y[(l - x') = x n Y[(l-x% 

which does not vary with the partition A of n. 

The event (T = n) is the disjoint union, over all partitions A of n, of 
the events whose probabilities are given in (j2ip . showing that 

(22) F x (T = n)= Pn x n - x'). 

If we are interested in random partitions of n, an especially effective 
choice for x, used by [T2], [28], is 

(23) x{n) = exp(—c/y/n), where c = 7r/v6. 
Under this choice, we have, as n — > oo, 

(24) -E x[n) T^l, 4^ Va Mn) T ^-; 
n n 6 ' 2, c 

this is essentially a pair of Riemann sums, see [5], page 106. If we write 
a(x) for the standard deviation of T, then the second part of f l24"l) may 
be paraphrased as, with x = x(n), as n — > oo, 

(25) a{x) ~ ^ 3/4 - 

7 We write ¥ x or P, or Zi or interchangeably, depending on whether the 

choice of x € (0, 1) needs to be emphasized, or left understood. 



14 



ARRATIA AND DESALVO 



The local central limit heuristic would thus suggest asymptotics for 
F X (T = n), and these simplify, using f[2"31 and ([21]), as follows: 

(26) p x .(T = n)~ 1 ' 



^yra(x) 2v / 6n 3 / 4 ' 

The Hardy- Ramanuj an asymptotics ffTSl) and the exact formula 
combine to show that (126]) does hold. 

Theorem 1. Analysis of Waiting-to-get-lucky. 

Consider the following algorithm to generate a random partition of 
n, chosen uniformly from the p n ~ exp(2c\/n)/(4\/3n) possibilities. 
Use the distributions in f fl9|) . with parameter x given by ( 123]) . 

(1) Propose a sample, Z\, Z 2) ■ ■ ■ , Z n ; compute T n := Z\ + 2Z 2 + 
h nZ n . 

(2) In case (T n = n), we have got lucky. Report the partition A 
with Zi parts of size i, for i = 1 to n, and stop. Otherwise, 
repeat from the beginning. 

This algorithm does produce one sample from the desired distribu- 
tion, and the expected number of proposals until we get lucky is asymp- 
totic to 2v / 6n 3 / 4 . 

Proof. It is easily seen that F x (Zi = for all i > n) — > 1. [In detail, 
P(not {Zi = for alH > n)) < £ J>n P(Z, ^ 0) = £ 4>n < £,> n x* = 
x n /(l — x) ~ x n /(c/y/n) = exp(—cy/n)y/n/c —> 0.] Hence the asymp- 
totics ( |26j) . given for the infinite sum T, also serve for the finite sum T n , 
in which the number of summands, along with the parameter x = x(n), 
varies with n. □ 

Remark 4. We are not claiming that the running time of the algorithm 
grows like n 3//4 , but only that the number of proposals needed to get one 
acceptable sample grows like n 3//4 . The time to propose a sample also 
grows with n. Assigning cost 1 to each call to the random number 
generator, with all other operations being free, the cost to propose one 
sample grows like y/n rather than n; details in Section l57R Combining 
with Theorem d the cost of the waiting-to-get-lucky algorithm grows 
like n 5 / 4 . A simple Matlab® program to carry out waiting-to-get lucky 
is presented in Section [3T71 

4.3. Divide-and-conquer, by small versus large. The waiting-to- 
get-lucky strategy is limited primarily by the probability that the target 
is hit, which diminishes like n~ 3//4 . Already at n = 10 8 , the probability 
is one in a million. Instead of trying to hit a hole in one, we allow 
approach shots. 
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Recall that to sample partitions uniformly, based on ( jl~9l - ( |26l) . our 
goal is to sample from the distribution of (Z x , Z 2 , . . . , Z n ) conditional 
on T = n. Using x = x(n) from (|23|) . and with b G {1, 2, . . . , n — 1}, 
we let 

(27) A — (Zb+i, Zb+2, ■ ■ ■ , Z n ), B = (Zi, . . . , Z b ). 

Motivated by the standard paradigm for deterministic divide-and-conquer, 
that the two tasks should be roughly equal in difficulty, we will take 
b = 0{^Jn), having observed that the expected largest part of a random 
partition grows like \fri log n. With 



(28) T A = iZ i> T B = Y^%Zi, 

i=b+l i=l 

we want to sample from (A, B) conditional on Ta + Tb = n. We have 
A, B independent, and we use h(A, B) = 1(Xa + Tb = n). The divide- 
and-conquer strategy, according to Lemma [H is to sample X from 
the distribution of A biased by the probability that an independently 
chosen B will will make a match, and then, having observed A = a, 
sampling Y from the distribution of B conditional on h(a, B) = 1. 

In order to simulate X, we will use rejection sampling, as reviewed 
in Section [2721 To find the optimal rejection probabilities, we want the 
largest C such that 

¥(T B = n-j) 
C max ; ; — < 1, 



or equivalently, 

(29) C 



P(T n = n) 

HT n = n) 
maxj P(Tb = j) 



The values P(Tg = k) for k = 0,1, ... ,n can be simultaneously com- 
puted, using the recursion ( TTT|) ; what we really have is a variant of 
the n by n table method of Section 14.14 m which the table is b by n, 
so the computation time is bn; furthermore, one only needs to store 
the current and previous row, (or with overwriting, only the current 
row), so the storage is n. Once we have the last row of the b by n 
table, we can easily find C and indeed the entire threshold function 
t. The time to compute the table, bn, would have been the much 
larger (n — b)n had we interchanged the roles of A and B, i.e., taken 
B := {Z h+X , Z b+2 , Z n ) instead of B := [Z x , Z b ). 
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In order to simulate Y = (B\Z\+2Z 2 +- ■ -+bZb = n—k), in situations 
where b is too large to store a b by n table, we resort to waiting-to-get- 
luckyH Our goal for the next section is to explore mix-and-match using 
integer partitions as an example, regardless of whether there exists a 
better competing algorithm for integer partitions. 

4.3.1. Divide- and- conquer with mix-and-match. When a sample of size 
m > 1 is desired, Lemmas [T] - [3] can be used to generate unbiased 
samples. Observe that the matching function with h(A,B) = 1(Ta + 
Tb — n), for (A, B) e Ax B, satisfies condition 2 of Lemma [2j with 
c_a(A) = Ta and cb{B) = n—Ts- Hence mix-and-match is enabled. The 
first phase of the algorithm, phase A, generates samples A ly A 2 , . . . , A m 
from X, according to Lemma [TJ This creates a multiset of m colors, 
{ci, . . . , c m }, where q = c^(Ai). We think of these as m demands that 
must be met by phase B of the algorithm. One strategy for phase B is 
to generate an iid sequence of samples of B; initially, for each sample, 
we compute its color c = c&(B) and check whether c is in the multiset 
of demands {ci, . . . , c m }; when we find a match, we pair B with one 
of the Ai of the matching color, to produce our first sample, which we 
set as Si = (Ai,B). Then we reduce the multiset of demands by one 
element, and iterate, until all m demands have been met. Lemma [3] 
implies that the resulting list (Si, S 2 , . . . , S m ) is an iid sample of m 
values of S, as desired. 

Remark 5. Note that in the above, if the first match found, (Ai,B), is 
labelled as S±, and the second matching pair is labelled S2, and so on, 
then the resulting list (Si, S 2 , ■ ■ ■ , S m ) is not necessarily an iid sample 
of S; the colors c with P(c = cb(B)) large would tend to show up earlier 
in the list. 

4.3.2. Roaming x. Consider again a sample of size m — 1. Having 
accepted X = (Zb + i, . . . , Z n ) with color k = Ta, in the notation of 
fl28|) . we now need Y, which is B = (Z\, . . . , Zb) conditional on having 
color k, which simplifies to having n — k = Tb '■— Y^i=i^i- ne 
obvious strategy is to sample B repeatedly, until getting lucky. The 
distribution of B is specified by f|T9|) and fl23l) - - with a choice of 
parameter, x = x(n), not taking into account the values of b and 
n — k. A computation similar to fl2~T|) shows that the distribution of 
(Z\(y), . . . , Zb(y)) conditional on J2i=i iZi(y) = n — k is the same, for 
all choices y G (0, 1). 

8 Of course, instead of trying to get lucky all at once, one might apply divide- 
and-conquer to this B task. But we do not pursue this particular iteration, in light 
of a better iteration scheme, presented in Section 14.51 



PROBABILISTIC DIVIDE- AND-CONQUER 



17 



As observed in [5], the y which maximizes P^Tb = n — k, i.e., gives us 
the best chance of getting lucky, is the solution of n—k = J2i=i ^ ^ (v) ■ 
Thus, in the case m — 1, the optimal choice of y is easily prescribed. 
However, for large m we also recommend using mix-and-match, which 
brings into play a complicated coupon collector's situation. With a 
multiset of demands {ci, . . . , c m } from Section 14.3.1^ the algorithm de- 
signer has many choices of global strategy; it is not obvious whether or 
not a greedy strategy — picking y for the next proposed B, to max- 
imize the chance that B satisfied at least one of the demands — is 
optimal. 

4.3.3. Deliberately missing data. For motivation: in the B phase of 
mix-and-match, one situation would have all m demands ci, C2, . . . , c m 
distinct, and in addition, F(cb{B) = q) relatively constant as % varies. 
In this situation,we have the classic coupon collector's problem; with 
time m log m to collect all m coupons. There is a harmonic slowdown: 
for example, the first match is found 100 times as quickly as the match 
after ninetynine percent of the demands have been met; the last v 
matches are expected to take about (1 + 1/2 + ■ ■ • + 1/v) / \ogm as 
long as the first m — v matches combined. In the situation with the 
F(cb{B) = q) relatively constant, but with large multiplicities in the 
multiset {ci, . . . , c m }, there is an even more dramatic slowdown near 
the end, based on the size of the set underlying the multiset of remain- 
ing demands. Note however, the values F(cb{B) = q) might not be 
relatively constant, even if we adjust the sampling distributions of B to 
fit the particular colors remaining to be found, as described in Section 

Suppose we stop before completing the B phase, with some v of 
demands remaining to be met. There is usable information in the 
partially completed sample, which has the m — v partitions found so 
far. This list of m — v is not a sample of size m — v, since sample 
requires iid from the original target distribution. But, had we run the 
B phase to completion, we would have had an honest sample of size m, 
so there is information in knowing all but v of the m values. Think of 
this sample of size m as the sample, with some v of its elements being 
unknown. For estimates based on the sample proportion, an error 
of size at most v/m is introduced by the unknown elements. Since 
the standard deviation, due to sample variability, decays roughly like 
1 / y/m, it makes sense to allow v comparable to \fm. 

For example, Pittel [29] proves that, as n — > oo, given two parti- 
tions of n, choosing (A,/i) uniformly over the {p n ) 2 possible pairs, the 
probability n n that A dominates /x satisfies rr n — >■ 0. Also, he proves 
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that, choosing a single partition A uniformly over the p n possibilities, 
the probability 7r* that A dominates its dual A' satisfies 7r* — > 0. It 
is natural to guess that lim7r n /7r* exists, with a value in [0, oo]; one 
can use simulations to suggest whether 0, 1, or oo is the most plausible 
value. The harder task is to simulate for 7t n . 

If one has an honest sample of 2m partitions of n, with v missing 
items due to terminating the B phase early, then one would have an 
honest sample of m — V pairs (A,/i), with random variable V < v f] If 
we had v = 0, and H of the pairs have A dominates fi, then the point 
estimate for p := 7i n is H/m, and the standard (1 — a)% confidence 
interval is 



H 

m 



p(l-p) H p(l-p) 

Z a/2\ , r Z a /2 



m 



rn 



rn 



With V missing pairs, consider the count K , how many of the m — V 
completed pairs had A dominates \i. We can do a worst-case analysis by 
assuming, on one side, that all V of the missing pairs have domination, 
and on the other side, that none of the V missing pairs have domination; 
more succinctly, H e [K, K + V}. Hence the confidence interval 



rn 



K /p(l-p) K + V 

— Z a/2\ , 1" Za/2 



in 



in 



P(l ~p) 



rn 



is at least a (1 — a)% confidence interval, for the procedure with delib- 
erately missing dat?^ 



4.4. Divide-and-conquer with a trivial second half. Ignoring the 
paradigm that divide-and-conquer should balance its tasks, in (127)1 a 
very useful choice is b = 1. Loosely speaking, it reduces the cost of 
waiting-to-get lucky from order n 3 / 4 to order n 1//4 . 

The analysis of the speedup, relative to waiting-to-get-lucky, is easy. 



The pairing on {1, 2, ... , 2m} must be assigned before observing which v items 
are missing. Pairing up the missing partitions, in order to get pu/2] missing pairs, 
in not valid; see Remark [5l 

10 We feel compelled to speculate that many users of "confidence intervals" don't 
really care about the confidence, nor the width of the interval, but really rely on 
the center of the interval, which in the standard case is an unbiased estimator. 
Our interval has center (K + V/2) /m, and this value is not an unbiased estimator; 
indeed, anything based on K, including the natural K/(m — V), is biased in an 
unknowable way. 
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Theorem 2. The speedup of the 6=1 procedure above, relative to 
the waiting-to-get-lucky algorithm described in Theorem Ql is asymp- 
totically \/n/c, with c = n/y/E. Equivalently, the acceptance cost is 
asymptotically 2 n 1//4 6 3//4 /7r. 

Proof. Recall that x = e~ c ^. From (fTOI) and (I2"9~j) . the acceptance cost 
l/C is given by C = P(T = n)/max fc P(Z 1 = k) = P(T = n)/P(Zi = 
0) = P(T = n)/(l — x). The comparison algorithm, waiting-to-get- 
lucky, has acceptance cost 1/P(T = n). The ratio simplifies to 1/(1 — 
x) ~ \/n/c. □ 

To review, stage 1 is to simulate (Z 2 , Z 3 , . . . , Z n ), and accept it with 
probability proportional to the chance that Z\ = n — (2Z 2 + • ■ ■ + 
nZ n ); the speedup comes from the brilliant idea in [32]. In contrast, 
waiting-to-get-lucky can be viewed as simulating (Z 2 , Z 3 , . . . , Z n ) and 
then simulating Z\ to see whether or not Z\ = n — (2Z 2 + • • • + nZ n ). 

4.5. Self-similar iterative divide-and-conquer: p(z) = d(z)p(z 2 ). 
The methods of Sections 14.21 - 14.41 have acceptance costs that go to 
infinity with n. We now demonstrate an iterative divide-and-conquer 
that has an asymptotically constant acceptance cost. 
A well-known result in partition theory is 

(so) P (z) = - zr 1 ={u i+ A (ll rh~) = 

where d(z) = Yl { 1 + z % is the generating function for the number of 
partitions with distinct parts, and p(z 2 ) is the generating function for 
the number of partitions where each part has an even multiplicity. This 
can of course be iterated to, for example, p(z) = d(z)d(z 2 )p(z 4 ), etc., 
and this forms the basis for a recursive algorithm. 

Recall from (1191) in Section H~2"l that each Zj = Zi(x) is geometrically 
distributed with P(Z{ >k)= x lk . The parity bit of Zj, defined by 

€i = l(Zi is odd), 

is a Bernoulli random variable q = ej(x), with 

(31) Pfe(s) = 1) = Pfe(s) = 0) = 

Furthermore, (Zi(x) — e?,)/2 is geometrically distributed, as Zi (x ) , 
again in the notation (|T9l . and [Zi{x) — ej)/2 is independent of e«. 
What we really use is the converse: with €i(x) as above, independent 
of Zi(x 2 ), the Zi(x) constructed as 

Zi(x) :=e i (x) + 2Z i (x 2 ), z = l,2,... 
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indeed has the desired geometric distribution. 

Theorem 3. The asymptotic acceptance cost for one step of the iter- 
ative divide- and- conquer algorithm using A = (ei(ar), e 2 (x), . . .), B = 
((Z^x) - ei)/2, (Z 2 (x) - e 2 )/2, . . .), is V&. 

Proof. The acceptance cost 1/C can be computed via f l29|) and fl26|) . 
with 



C 



P X (T = n) F X (T = n) TS^y 



max fc (T = 2=±) maxfe P x . 2 (T = k) 1 
4- 3 / 4 = 8~ 1/2 



n -3/4 



(n/4)- 3 / 4 



□ 



Here is an informal discussion of the full algorithm. First, propose A 
until getting acceptance, then, since the B task is to find a uniformly 
chosen partition of a smaller integer, iterate to finish up. In effect, the 
iterative algorithm is to determine the (Zi, Z 2 , . . . , Z n ) conditional on 
Z\ + 2Z 2 + ... — n, by finding the binary expansions: first the Is bits 
of all the ZjS, then the 2s bits, then the 4s bits, and so on. 

With a little more detail: to start, with A = (ei(x), €2(2;), • • •) and 
T4 = J2i^ e i^ we have ET4 = J2i=i jf^ ~ an d it can be shown 
that, even after conditioning on acceptance, the distribution of Ta is 
concentrated around n/2. Since B = ((^(x)— ei)/2, (Z 2 (x)— e 2 )/2, . . .) 
is equal in distribution to (Zi(x 2 ), Z 2 (x 2 ), . . .), and target n' — (n — 
Ta)/2, we see that the B phase is to find a partition of the integer n', 
uniform over the p n t possibilities. In carrying out the B task we simply 
use x{n') as the parameter, but the choice (x(n)) 2 would also work. 

4.5.1. Exploiting a parity constraint. Theorem |3] states that the as- 
ymptotic acceptance cost for proposals of A = (ei(x), e 2 (x), . . .) is 
2 a/2, and this already takes into account an obvious lower bound of 
2, since the parity of Ta = ei + £2 + ■ • • + e„ is nearly equally dis- 
tributed over {odd,even}, and rejection is guaranteed if Ta does not 
have the same parity as n. An additional speedup is attainable by 
moving ei from the A side to the B side: instead of simulating ei, there 
now will be a trivial task, just as there was Z\ in the "6 = 1" proce- 
dure of Section 14.41 That is, we switch to A = (e 2 (x), e 3 (x), . . .) and 
B = (ei(x), (Zi(x) — ei)/2, (Z 2 (x) — e 2 )/2, . . .); the parity of the new 
Ta dictates, deterministically, the value of the first component of B, 
under the conditioning on h(a, B) = 1. The rejection probabilities for 
a proposed A are like those in Theorem [3j but with an additional factor 
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of 1/(1 + or x/(l + X 1 ), depending on the parity of n + 62 + • ■ • + e n . 
Since x = x(n) — > 1 as n — > oo, these two factors both tend to 1/2, so 
the constant C as determined by (129|) becomes, asymptotically, twice 
as large. 

Theorem 4. 27ie asymptotic acceptance cost for one step of the it- 
erative divide- and- conquer algorithm using A = (t2{x), e^x), . . .) and 
B = (e x (x), {Z x {x) - eO/2, (Z 2 (z) - e 2 )/2, . . .) is y/2. 

Proof. The acceptance cost 1/C can be computed, as in the proof of 
Theorem [3l with the only change being that in the display for comput- 
ing C, the expression under the max^, which was F(T(x 2 ) = (n — k)/2) 
changes to 



P(2|n-fc + ei(a;) and T(x 



2\ 



n — k 



P(2|n -k + ei(x)) x P ( T(x 2 ) = ~ ip ("t(x 2 ) = 



□ 



4.5.2. The overall cost of the main problem and all its subproblems. 
Informally, for the algorithm in the previous section, the main prob- 
lem has size n and acceptance cost v^2, applied to a proposal cost 
asymptotic to Coy/n, for a net cost \/2cQy/n. The first subproblem has 
random size, concentrated around n/4, and hence half the cost of the 
main problem. The sum of a geometric series with ratio 1/2 is twice 
the first term, so the net cost of the main problem and all subproblems 
combined is 2\/2coy/n. 

In framing a Theorem to describe this, we try to allow for a variety 
of costing schemes. We believe that the first sentence in the hypotheses 
of Theorem [5] is valid, with 6 = 1/2, for the scheme of Remark HI The 
second sentence, about costs of tasks other than proposals, is trivially 
true for the scheme of Remark HI but may indeed be false, in costing 
schemes which assign a cost to memory allocation, and communication. 

Theorem 5. Assume that the costC(n) to propose the A = (e^x), e$(x), . . .) 



in the first step of the algorithm of Section 4-5.1, is given by a de- 
terministic function with C(n) ~ c^n 9 for some positive constant Cq 
and constant 9 > 1/2, or even more generally, C(n) = n e times a 
slowly varying function of n. Assume that the cost of all steps of the 
algorithm, other than making proposals]^ is relatively negligible, i.e., 



11 such as the arithmetic to compute acceptance/rejection thresholds, the gener- 
ation of the random numbers used in those acceptance/rejection steps, and merging 
the accepted proposals in to a single partition 
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o{C{n)). Then, the asymptotic cost of the entire algorithm is 
^-±^V2C(n)<2V2C(n). 

Proof. The key place to be careful is in the distinction between the 
distribution of a proposed A = (e 2 (x), e 3 (x), . . .), and the distribu- 
tion after rejection/acceptance. For proposals, in which the are 
mutually independent, with T A : = ^2^ e «( x )> with x = x{n) from 
(J23D, calculation gives ET A ~ n/2 and Var(T A ) ~ (l/c)n 3 / 2 . Cheby- 
shev's inequality for being at least k standard deviations away from 
the mean, to be used with k = k(n) = o(n l l 4 ), and k — » oo, gives 
F(\T A -ET A \ > kSD(T A )) < 1/k 2 . 

Now consider the good event G that a proposed A is accepted; condi- 
tional on G, the q are no longer mutually independent. But the upper 
bound from Chebyshev is robust, with P(|Xa — KT a \ > k SD(T A )\G) < 
l/(k 2 F(G)). Since F(G) is bounded away from zero, by Theorem HJ 
we still have an upper bound which tends to zero, and shows that 
(n — T A )/2, divided by n, converges in probability to 1/4. 

Write Ni = Ni(n) for the random size of the subproblem at stage 
i, starting from No(n) = n. The previous paragraph showed that 
for i = 0, Ni + i(n) / Ni(n) — > 1/4, where the convergence is conver- 
gence in probability, and the result extends automatically to each fixed 
i = 0,1,2,.... We have deterministically that N i+ i/Ni < 1/2, so in 
particular Ni > implies N i+1 < N. Set C(0) = 0, redefining this 
value if needed, so that the costs of all proposals is exactly the random 

i>0 

It is then routine analysis to use the hypothesis that C(n) is regularly 
varying, to conclude that S(n)/C(n) — > 1/(1 — 4~ e ), where again, the 
convergence is convergence in probability. The deterministic bound 
N + i(n) / N(n) < 1/2 implies that the random variables S(n)/C(n) are 
bounded, so it also follows that ES(n)/C(n) -)■ 1/(1 - A~ e ). □ 

4.5.3. A variation based on p(z) = p dd(z) p(z 2 ) . With 

Pom{z):= J] (I-**)" 1 , 
i odd 

Euler's identity d(z) = p dd(z) suggests a variation on the algorithm 
of section I4T51 It is arguable, whether the original algorithm, based on 
p(z) = d(z)p(z 2 ), and the variant, based on p(z) = p dd(z) p(z 2 ) , are 
genuinely different. 
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Arguing the variant algorithm is different: the initial proposal is 
A = (Zi(x), Z 3 (x), Z 5 (x), . . .). Upon acceptance, we have determined 
(Ci(A), C 3 (A), C 5 (A), . . .), where A is the partition of n that the full 
iterative algorithm will determine, and Cj(A) is the number of parts 
of size i in A. The B task will find (C 2 (A), C 4 (A), C 6 (A), . . .) by iterat- 
ing the divide-and-conquer idea, so that the second time through the 
A procedure determines (C^A), Cq(\), Cio(A), . . .), and the third time 
through the A procedure determines (C 4 (A), Ci 2 (A), C 2 o(A), . . .), and 
so on. 

Arguing that the variant algorithm is essentially the same: just as 
in Euler's bijective proof that p dd(z) = d(z), the original algorithm 
had a proposal A = (ei(x), €2(2), . . .), which can be used to construct 
the proposal (Zi(x), Z 3 (x), Zs(x), . . .) for the variant algorithm. That 
is, one can check that starting with independent e(i,x) = ei(x) given 
by (j5Ij), for j = 1,3,5,..., Z 5 := £ m > e (i 2 m ; x ) 2 ™ indeed has the 
distribution of Zj(x) specified by (fl9j) . with Zi,Z 3 ,... independent. 
And conversely, one can check that starting with the independent geo- 
metrically distributed Zi(x), Z 3 (x), . . ., taking base 2 expansions yields 
mutually independent ei(x), 62(2:), . . . with the Bernoulli distributions 
specified by (pJIj) . Hence one con/c? program the two algorithms so that 
they are coupled: starting with the same seed, they would produce the 
same sequence of colors T4 for the initial proposal, and the same count 
of rejections before the acceptance for the first time through the A 
procedure, with same Ta for that first acceptance, and so on, including 
the same number of iterations before finishing. Under this coupling, 
the original algorithm produces a partition \i of n, the variant algo- 
rithm produces a partition A of n — and we have implicitly defined the 
deterministic bijection / with A = /(//). 

Back to arguing that the algorithms are different: we believe that 
the coupling described in the preceding paragraph supplies rigorous 
proofs for the analogs of Theorems [3] and HI For Theorem however, 
one should also consider the computational cost of Euler's bijection, 
for various costing schemes, and we propose the following analog, for 
the variant based on p(z) = p dd(z) p{z 2 ) , combined with the trick of 
moving ei(x) from the A side to the B side, as in Section [4.5. It 



Theorem 6. Assume that the cost D{n) to propose (Zi(x), Z 2 (x), . . . , Z n ( 
with x = x(n), satisfies D(n) = n e times a slowly varying function 
of n. Assume also that the cost Da(ti) to propose A = (Zi(x) — 
ei(x), Z 3 (x), Z 5 (x), . . .) satisfies D A {n) ~ D(n)/2. 
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Then, the asymptotic cost of the entire algorithm is 
1 _\ /4 f ) V2D(n)/2<V2D(n). 

Proof. Essentially the same as the proof of Theorem |5j □ 

It is plausible that the cost function C(n) from Theorem [5] and the 
the cost function D(n) from Theorem [6] are related by C(ri) ~ D(n); 
note that this depends on the choice of costing scheme, essentially 
asking whether or not the algorithmic cost of carrying out Euler's odd- 
distinct bijection is negligible. 

Another argument that the two algorithms are different arises from 
the completely artificial example at the end of Section 15.41 

5. For integer partitions: review, method of choice 

In Section HJ five methods were presented for the simulation of par- 
titions of the integer n. Now we review the costs of running these 
algorithms, taking into account the size of n, the number m of samples 
to be delivered, and so on. We also consider alternate simulation tasks 
involving integer partitions with restrictions on the parts to see how 
the methods adapt. 

5.1. Method of choice for unrestricted partitions. If one is in- 
terested in generating just a few partitions of a moderately sized n, 
then the waiting-to-get-lucky method, with a dumb "time n" method 
of proposal for (Zi, . . . , Z n ), is very easy to program. The overall run- 
time is order of n 7//4 — a factor of n to make a proposal, and a factor 
of n 3//4 for waiting to get luckyF 2 ] For example, in Matlab® [23] 

n=100; logx=-sqrt(6*n)/pi; s=0; 
while s~=n, 

Z=floor(log(rand(n,l)) ./(l:n) ' .*logx) ; 

s=(l:n)*Z; 

end 

runs on a common desktop computeiEl at around 600 partitions of 
n = 100 per second; with n = 1000 the same runs at about 20 partitions 
per second, and at n = 10, 000 takes about 2 seconds per partition. 

12 A smarter "time y/n" method of proposal for (Zi, . . . , Z n ) is described in 
Secton 15.21 It is harder to program, but gets the overall runtime down to order of 
n 5 / 4 . 

13 Macintosh iMac, 3.06 Ghz, 4 GB RAM 
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The table method is by far the fastest method, if one is interested 
in generating many samples, and the table of size n 2 floating point 
numbers fits in random access memory. For example, for n = 10, 000 
the same computer as above takes 5 seconds to generate the table — a 
one time cost, and then finds 40 partitions per second. At n — 15, 000, 
the same computer takes 28 seconds to generate the table, and then 
finds 25 partitions per second. But at n = 19, 000, the computer 
freezes, as too much memory was requested. 

The divide-and-conquer methods of Sections 14.31 and I4.4[ using the 
small versus large division of (1271) . offer a large speedup over waiting- 
to-get-lucky, but only for case b = 1, with its trivial second half, can 
we analyze the speedup — the y/n/ c factor in Theorem [2j 

The divide-and-conquer method based on p(z) = d(z)p(z 2 ) is un- 
beatable for large n. Regardless of the manner of costing, be it only 
counting random bits used, or uniform costing, or logarithmic (bitop) 
costing, the cost to find a random partition of n must be asymptoti- 
cally at least as large as the time to propose (Zi, . . . , Z n ) for a random 
partition of a random number around n. The entire divide-and-conquer 
algorithm of Theorem EJ compared with just proposing (Zi, . . . , Z n ), 
has asymptotically an extra cost factor of \/2. So the claim of un- 
beatable at the start of this paragraph really means: asymptotically 
unbeatable by anything more than a factor of \[2. 

5.2. Complexity considerations. At the end of Section I2T21 we note 
that in the general view of probabilistic divide-and-conquer algorithms, 
a key consideration is computability of the acceptance threshold t{a). 
The case of integer partitions, using any of the divide-and-conquer algo- 
rithms of Section I4.5[ is perhaps exceptionally easy, in that computing 
the acceptance threshold is essentially the same as evaluating p m , an 
extremely well-studied task. For m > 10 4 a single term of the Hardy- 
Ramanujan asymptotic series suffices to evaluate p m with relative error 



less than 10 16 ; see Lehmer 





and numerical tabulationEl, together with Lehmer's guarantee, shows 
that 



lnp n - In /in (n) | < 10~ 16 for all n > 810. 



We thank David Moews for bringing these papers to our attention, 
'done in Mathematica® 8. 
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Is floating point accuracy sufficient, in the context of computing an 
acceptance threshold t(a)? There is a very concrete answer, based on 
[20] . First, as in Lemma 7.14 in [2], given p G (0, 1), a p-coin can be 
tossed using a random number of fair coin tosses; the expected number 
is exactly 2, unless p is a kth level dyadic rational, i.e., p = i/2 k with 
odd i, in which case the expected number is 2 — 2 1_fc . The proof is by 
consideration of say B, B x , B 2 , . . . iid with F(B = 0) = F(B = 1) = 1/2; 
after r tosses we have determined the first r bits of the binary expansion 
of a random number U which is uniformly distributed in (0, 1), and the 
usual simulation recipe is that a p-coin is the indicator 1(U < p). 
Unless |_2 r p\ = \_2 r U\, the first r fair coin tosses will have determined 
the value of the p-coin. Exchanging the roles of U and p, we see that 
number of bits of precision read off of p is, on average, 2, and exceeds 
r with probability 2~ r . If a floating point number delivers 50 bits of 
precision, the chance of needing more precision is 2~ 50 , per evaluation 
of an indicator of the form 1(U < p). Our divide-and-conquer doesn't 
require very many acceptance/rejection decisions; for example, with 
n = 2 60 , there are about 30 iterations of the algorithm in Theorem 
HI each involving on average about \/2 acceptance/rejection decisions, 
according to Theorem[3l So one might program the algorithm to deliver 
exact results; most of the time determining acceptance thresholds p = 
t(a) in floating point arithmetic, but keeping track of whether more 
bits of p are needed. On the unlikely event, of probability around 
30 x v / 2/2 50 < 4 x 10~ 14 , that more precision is needed, the program 
demands a more accurate calculation of t(a). This would be far more 
efficient than using extended integer arithmetic to calculate values of 
p n exactly. 

Another place to consider the use of floating point arithmetic is in 
proposing the vector (Zi(x) , . . . , Z n (x)) . If one call to the random 
number generator suffices to find the next arrival in a rate 1 Poisson 
process, we have an algorithm using 0(y/n) calls, which can propose 
the entire vector (Zi, Z%, . . .), using x = x{n) from fl23|) . The proposal 
algorithm is based on a compound Poisson representation of geometric 
distributions, and is similar to a coupling used in [3], Section 3.4.1. 
The supporting calculation here is that, with x = x(n) = exp(— c/^/n) 
and c = it/ V6, 

^ x %3 sr^ 1 jx J . vr 2 x r 

»,J'>1 J j J 

The algorithm constructs the rate 1 Poisson process on (0, s(n)), with 
the full interval partitioned into subintervals of lengths x %3 /j. With Yij 
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equal to the number of arrivals in the sub interval of length x^/j, the 
Yij are mutually independent, Poisson distributed, and Z { := 
constructs the desired mutually independent geometrically distributed 



Once again, suppose we want to guarantee exact simulation of a 
proposal (Zi(x), . . . , Z n (x)). In the compound Poisson process with 
s(n) arrivals expected, we need to assign exactly, for each arrival, say 
at a random time R, the corresponding index such the partial 

sum for s(n) up to, but excluding the ij term, is less than R, but the 
partial sum, including the ij term, is greater than or equal to R. Based 
on an entropy result from Knuth-Yao [20], a crucial quantity is 



An exact simulation of the Poisson process, assigning ij labels to each 
arrival, can be done3 with 0(s(n) + h(n)) genuine random bits, and 
the bounds for s(n) and h(n) show that this is 0{\/n). 

The costing scheme which counts only the expected number of ran- 
dom bits needed is clearly inadequate. Consider the impractical algo- 
rithm: list all p n partitions, for example in lexicographic order. Use 
[log 2 p n ~\ random bits to choose an integer / uniformly from {1,2 . . . ,p n } 
Report the Jth partition of n. If one costs only by the number of ran- 
dom bits needed, the algorithm just described is strictly unbeatable! 

5.3. Partitions with restrictions. As with unrestricted partitions, 
if n is moderate and a recursive formula exists, analogous to that of 
Section 14.14 then the table method is the most rapid, and divide-and- 
conquer is not needed. However, the requirement of random access 
storage of size n 2 is a severe limitation. 

The self-similar iterative divide-and-conquer method of Section 1431 is 
nearly unbeatable for large n, for ordinary partitions. There are many 
classes of partitions with restrictions that iterate nicely, and should be 
susceptible to a corresponding iterative divide-and-conquer algorithm. 
Some of these classes, with their self-similar divisions, are 

(1) distinct parts, d(z) = d dd(z)d(z 2 ); 

16 on each interval (m — l,m] for m — 1 to [~s(n)"|, perform an exact simulation 
of the number of arrivals, which is distributed according to the Poisson distribution 
with mean 1. For each arrival on (m— 1, m], there is a discrete distribution described 
by those partition points lying in (m — l,m), together with the endpoints m— 1 and 
m; calling the corresponding random variable X m , the sum of the base 2 entropies 
satisfies h(X m ) < h(n) + s(n), since each extra subdivision of one of the original 
subintervals of length x IJ / j adds at most one bit of entropy. 



Z 1 (x),Z 2 {x),. . .. 
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(2) odd parts, p odd (z) = d odd (z)p odd (z 2 ); 

(3) distinct odd parts, d odd (z) = d*(z)d odd (z 3 ). 

Here d* (z) = (l + z)(l + z 5 )(l + z 7 )(l + z 11 ) ■ ■ ■ represents distinct parts 
= ±1 mod 6. Other recurrences are discussed in [TH 123 ED], and the 
standard text on partitions pQ. 

It is not easy to come up with examples where the optimal divide- 
and-conquer is like that in Section fl~3l based on small parts versus large 
parts. One suggestion is partitions with all parts prime; there should 
be a large range of n for which table methods are ruled out by the 
memory requirement, while the n memory, bxn computational time to 
calculate rejection probabilities is not prohibitive. Another suggestion 
is partitions with a restriction on the multiplicity, for example, a part of 
size i can occur at most f(i) times — with / sufficiently complicated as 
to rule out iterative formulas such as those in the preceding parapgraph. 
Another family is, for d — 2, 3, . . ., partitions where all parts differ by 
at least d. 

5.4. An eye for gathering statistics. The underlying motivation 
for sampling a random element S, m times under some given distri- 
bution /i, might be to produce a statistic h(Si, S%, . . . , S m ) based on 
that sample. Conceivably, the deterministic function h might only 
look at some part of the data, so that there is a divide-and-conquer 
scheme in which S = (X,Y), using the notation of Lemma [U and 
h(S\, S2, ■ ■ ■ , S m ) = g(X±, X 2 , . . . , X m ) for a deterministic g. In this 
case, in the notation of Section [27T1 one need only carry out Stage 1. 

Here is a completely artificial example, motivated by the question of 
dominance, discussed in Section 14.3.31 For a partition A of n, define 
/(A) to be the partition, having distinct parts, in which % is a part of 
/(A) if and only if % has odd multiplicity as a part of A. Now suppose 
that one needs to approximate, for say n = 10 9 , 7r,P anty = P(/(A) 
dominates f(fJ>)), choosing (A, fi) uniformly over the (p n ) 2 possible pairs 
of partitions of n. 

6. For comparison: opportunistic divide-and-conquer with 

mix-and-match 

Recall the setup of (PQ) - In the third and fourth paragraphs 
of Section [L~2l we describe our starting point, a motivation stemming 
from mix-and-match, and admit that we can find no way to carry out 
"opportunistic" mix-and-match, to get a perfect sample. Here, we 
point out that nonetheless, the natural opportunistic procedure does 
supply a consistent estimator. 
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Take a deterministic design: for integers mi,m 2 > 1, let A±, . . . , A mi 
be distributed according to the distribution of A given in ([T]), and let 
£>i,..., £> m2 be distributed according to the distribution of B, with 
these rrii + m 2 sample values being mutually independent. The op- 
portunistic observations, under a take- all-you- can- get strategy, are 
all the pairs (Ai,Bj) for which h(Ai,Bj) = 1. To use these in an 
estimator, one would naturally count the available pairs, via 

(32) W = W( mi ,m 2 )= KA,Bj), 

l<i<mi ,l<j <rri2 

and for a deterministic function g : support h C A x B — > M., form the 
total score from these pairs, say 

(33) G = G( mi ,m 2 ) = g(A i ,B j )h(A i ,B j ). 

KKmi,l<j'<m2 

The natural estimator is the observed average score per pair, G/W. 
Unfortunately, this is not unbiased. However, it is consistent, i.e., as 
m 1 ,m 2 — » oo, G/W — > Eg(S) in probability. 

Theorem 7. For bounded g, we get an unbiased estimate of pILg(S) 
from Gj{m\m 2 \ Furthermore, as mi,m 2 — > oo G(mi,m 2 )/(mim 2 ) — > 
pILg(S), where the convergence is convergence in probability. 

Proof. To show unbiased: for each i, j we have, since h is an indicator, 
with Eh(Ai,Bj) = p, 

Eg(A h Bj) h(A h Bj) = E (g(A h B 3 ) \h(A h B 3 ) = 1) x p. 

According to (H]), the conditional distribution of (Ai,Bj) given h = 1 
is equal to the distribution of S, so the right side of the display above 
equals pEg(S). 

For consistency: write Xy = g(Ai, Bj) h(Ai, Bj). In the variance- 
covariance expansion of Var(G), there are m\m 2 terms cov(Xy, Xyji) 
where both i = i' and j = j', m\m 2 {m 2 — 1) terms where only i = i', 
miirrii — l)m 2 terms where only j = j', and all other terms — involving 
four independent random variables Ai, A^, Bj, Bj> — are zero. Hence 
Var(G/(mim 2 )) —> 0, and Chebyshev's inequality implies the desired 
convergence in probability. □ 

Observe that in the special case g — 1, the random variable G is the 
count W, so Theorem [7] implies that W/{mim 2 ) — > p. This shows that 
G/W is a consistent estimator of E g(S). 
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